Background: In the rapidly evolving field of artificial intelligence (AI), ChatGPT 4.0 and Gemini Advanced have emerged as significant tools with diverse applications, including healthcare. This study evaluates their efficacy in diagnosing hematologic diseases using pathology images, highlighting the potential of AI to augment medical diagnostics.

Methods: We utilized reference cases from the American Society of Hematology (ASH) Image Bank, which includes peer-reviewed hematologic images and text for educational purposes. To accommodate the models' constraints on image processing, we merged the images for each case into a single file. We included all ASH reference cases except those whose clinical details were insufficient to support the two-step approach described below. Each case was run in a separate chat session to prevent the models from applying any "learning" from previous cases to subsequent ones. We pasted the merged images, together with our prompts, into ChatGPT 4.0 and Gemini Advanced. A minimal clinical description was provided first (step 1); if the case was inaccurately diagnosed, a more comprehensive description was provided (step 2). At each step, we asked the models to give a diagnosis and a list of differential diagnoses ranked by probability. The aim was to assess and compare the diagnostic accuracy of the two models under varying information conditions.
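The study was conducted through the ChatGPT 4.0 and Gemini Advanced chat interfaces; purely as an illustration, an API-based equivalent of the two-step protocol might look like the sketch below. The model name, prompts, and file paths are placeholders and are not the study's actual materials.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_diagnosis(image_path: str, clinical_text: str) -> str:
    """Send one case (merged image + clinical description) as a fresh, single-turn request."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the study used the ChatGPT 4.0 chat interface
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"{clinical_text}\n\nProvide the most likely diagnosis and a "
                         "differential diagnosis list ranked by probability."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Step 1: minimal clinical description.
step1_answer = ask_diagnosis("case_merged.png", "Minimal clinical description ...")
# Step 2 (run only if step 1 misses the reference diagnosis): comprehensive description.
step2_answer = ask_diagnosis("case_merged.png", "Comprehensive clinical description ...")
```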

Results: Of 134 evaluated cases, 29 were excluded due to insufficient data, leaving 105 for analysis. With case images and limited clinical information (step 1), ChatGPT 4.0 identified the correct diagnosis in 38.1% of cases, compared to 26.6% for Gemini Advanced. When provided with detailed clinical information (step 2), ChatGPT 4.0's accuracy improved to 46.1%, versus 32.4% for Gemini Advanced. ChatGPT 4.0 also demonstrated superior performance in differential diagnosis, including the correct diagnosis in its differential list 82.8% of the time, versus 64.7% for Gemini Advanced. When the correct diagnosis appeared in the differential list but not as the top diagnosis, its average rank in step 1 was 3.6 for ChatGPT 4.0 and 2.5 for Gemini Advanced; in step 2, it was 2.7 for ChatGPT 4.0 and 2.0 for Gemini Advanced.
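To make the relationship between these summary metrics explicit, the following small sketch (with made-up per-case records, not the study's data) shows how top-diagnosis accuracy, the differential-inclusion rate, and the average rank of the correct diagnosis when not ranked first could be computed.

```python
from statistics import mean

# Hypothetical per-case records: rank of the correct diagnosis in the model's
# differential list (1 = top diagnosis), or None if it was absent from the list.
ranks = [1, 3, None, 2, 1, 5, None, 4]  # illustrative values only

n = len(ranks)
top_accuracy = sum(r == 1 for r in ranks) / n            # correct as the primary diagnosis
in_differential = sum(r is not None for r in ranks) / n  # correct anywhere in the list
mean_rank_when_missed = mean(r for r in ranks if r is not None and r > 1)

print(f"top-diagnosis accuracy: {top_accuracy:.1%}")
print(f"included in differential: {in_differential:.1%}")
print(f"mean rank when not first: {mean_rank_when_missed:.1f}")
```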

Conclusion: ChatGPT 4.0 outperformed Gemini Advanced in diagnosing complex hematologic cases, both as a primary and differential diagnosis tool. While both models show promise in supporting hematologic diagnostics, their reliability for clinical application requires further investigation and refinement.

Disclosures

Yacoub: Blueprint Medicine: Consultancy; Apellis: Consultancy; GSK: Consultancy; Karyopharm Therapeutics Inc: Consultancy; AbbVie: Consultancy; Servier: Consultancy; Novartis: Consultancy; Pfizer: Consultancy; PharmaEssentia: Consultancy; Incyte: Consultancy; Gilead: Consultancy; Notable Labs: Consultancy; Protagonist: Consultancy; CTI Pharma (SOBI): Consultancy; CTI Pharma (SOBI), Stemline Therapeutics: Research Funding.
